BOUT++: a framework for parallel plasma fluid simulations 
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Abstract 

A new modular code called BOUT++ is presented, which simulates 3D fluid equations in curvilinear coordinates. 
Although aimed at simulating Edge Localised Modes (ELMs) in tokamak x-point geometry, the code is able to 
simulate a wide range of fluid models (magnetised and unmagnetised) involving an arbitrary number of scalar and 
vector fields, in a wide range of geometries. Time evolution is fully implicit, and 3 rd -order WENO schemes are 
implemented. Benchmarks are presented for linear and non-linear problems (the Orszag-Tang vortex) showing good 
i-C agreement. Performance of the code is tested by scaling with problem size and processor number, showing efficient 

scaling to thousands of processors. 

Linear initial-value simulations of ELMs using reduced ideal MHD are presented, and the results compared to 
the ELITE linear MHD eigenvalue code. The resulting mode-structures and growth-rate are found to be in good 
^ agreement (■jbout++ = 0.245^.4, jelite = 0.239lua, with Alfvenic timescale 1/loa = R/Va)- To our knowledge, 

this is the first time dissipationless, initial- value simulations of ELMs have been successfully demonstrated. 
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tokamak edge plasmas efficiently, the methods used 
are very general and can be adapted to many other 
situations: any coordinate system metric tensor 
gV = g 1 ! (x, y) (i.e. constant in one dimension) can 
be specified, which restricts the coordinate system 
to those with axi- or translationally symmetric ge- 
ometries. Even 2D metric tensors encompass a wide 
range of situations, allowing the code to be used 
to simulate plasmas in slab, sheared slab, cylindri- 
cal and non-orthogonal coordinate systems such as 
flux coordinates for tokamak simulations. Exten- 
sion of the code to allow 3D metric tensors would 
be relatively straightforward, but is not currently 
necessary for the problems considered here. 



1. Introduction 



> 



BOUT+- 1- is a new highly adaptable, object- 
oriented C++ code for performing parallel plasma 
fluid simulations with an arbitrary number of equa- 
tions in 3D curvilinear coordinates using finite- 
difference methods, ft has been developed from the 
original BOUndary Turbulence 3D 2-fluid tokamak 
edge simulation code BOUT |1I2I3I4|5I5] , borrowing 
ideas and algorithms, but has been substantially 
altered and extended. Though designed to simulate 
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BOUT++ is designed to automate the common 
tasks needed for fluid finite-difference simulation 
codes, and to separate the complicated (and error- 
prone) details such as differential geometry, parallel 
communication, and file input/output from the 
user-specified physics equations to be solved, whilst 
remaining as flexible as possible. Thus the physics 
equations being solved are clearly provided in one 
place, and can be easily changed with only minimal 
knowledge of the inner workings of the code. As far 
as possible, this allows the user to concentrate on the 
physics, rather than worrying about the numerics. 

1.1. Related work 

Several frameworks for parallel simulation al- 
ready exist, for example POOMA and Overture 
(both parts of the Advanced CompuTational Soft- 
ware (ACTS) collection |7J. These are very flexible 
and powerful, but still require significant knowledge 
of parallel computing, and investment of time, to 
produce a working simulation. BOUT++ provides 
a more complete framework, significantly reduc- 
ing the time needed and allowing quick testing of 
methods and equations. 

Very similar in spirit to BOUT++ is the Open- 
FOAM project |8l9j . This is an unstructured mesh 
finite-volume code, which also uses C++ features 
such as operator overloading to simplify the process 
of creating new simulations. The main distinction 
is that whereas OpenFOAM is designed to simu- 
late complex shaped domains in Cartesian coordi- 
nates, BOUT++ simulates relatively topologically 
simple domains in complicated coordinate systems. 
Therefore, OpenFOAM is more suited to engineer- 
ing applications such as simulation of internal com- 
bustion engines [TO], whilst BOUT++ is more suited 
to problems in physics such as simulation of toka- 
maks where non-Cartesian coordinate systems can 
be used to exploit anisotropics and so optimise nu- 
merical solution. 

In this paper the present state of the BOUT++ 
code is described, with the general layout of the 
code discussed in section [3] after a brief introduc- 
tion to the physics motivation in section[2] Details of 
the numerical methods implemented are described 
in section |3.6| A series of test problems are used to 
demonstrate the accuracy and flexibility of the code 
in section [4] Since the most immediate application 
of BOUT++ is to problems in plasma physics, the 
test problems are drawn from this field. Simulation 



of Edge Localised Modes (ELMs) and comparison 
with the ELITE linear MHD eigenvalue code |llll2j 
are presented in section [5] Finally, the performance 
of the BOUT++ code is demonstrated in section [6] 
showing efficient scaling with number of processors 
for a fixed problem size (hard scaling) to thousands 
of processors in section |6.2| 

2. Physics overview 

Edge Localised Modes in tokamaks are sudden 
(sub-millisecond) releases of particles and energy, 
resulting in the eruption of filamentary structures 
from the plasma edge. An image of one of these 
events from the Mega- Amp Spherical Tokamak [13] 
using a 10 /xs exposure time is shown in figure [T] The 




Fig. 1. Negative image of an Edge Localised Mode in the 
MAST Tokamak 13 , showing eruption of filaments from the 
plasma edge 

particles and energy released during these events are 
deposited on material surfaces and are potentially 
damaging in future fusion devices. There is there- 
fore interest in understanding and controlling these 
events. 

ELMs are found to be triggered close to the sta- 
bility boundary of an ideal magnetohydrodynamic 
(MHD) mode, called the peeling-ballooning mode 
|14ll2j . This provides strong evidence that this mode 
is involved in triggering an ELM. Peeling-ballooning 
modes are destabilised by a combination of pres- 
sure gradients (ballooning) and currents close to the 
plasma edge (peeling) [IS]. Further details of the lin- 
ear structure of peeling-ballooning modes are given 
in section [5] where BOUT++ simulations of this 
mode are discussed. Although there are analytic the- 
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ories |16j and semi-analytic models |17j for the early 
non-linear evolution of this mode, it is not yet fully 
understood how this develops into the filamentary 
structures observed, and ultimately how particles 
and energy are lost. 

Several 3D non-linear codes have been used to 
simulate ELMs, including NIMROD [18119120] , 
BOUT [21132] . JOREK [23], GEM [25125] and M3D 
[26|27j . These codes incorporate a wide range of 
physics including (in the case of BOUT and some 
NIMROD simulations e.g. [25]) 2-fluid effects. The 
approach employed with these codes is essentially 
to reproduce experimental observations and then 
to relate these results back to analytic theory. A 
complementary approach is to attack the problem 
from the other direction: starting from the analytic 
theory (i.e. ideal MHD models), gradually build 
complexity into the model in order to approach 
experimental results. The BOUT++ code is being 
developed and benchmarked to follow this second 
approach. For this reason the BOUT++ code has 
been designed to be very flexible in order to allow 
rapid prototyping of simulations involving different 
physical models. This is useful because it is not 
yet known what physical effects are essential to 
adequately simulate an ELM, or what numerical 
methods are most suited to the problem. 

3. Structure of the program 

The BOUT++ code can be separated into the 
following components: 

- Time integration using the Sundials CVODE 



package [7] (section 3.1l 



Input and output to the Portable Data Binary 
(PDB) format [25] (section [3~2 1. 
Low-level data handling: encapsulation of vari- 
ables into objects with associated operators (sec- 
tion 



3.31 



Parallel communications using the Message Pass- 



ing Interface (MPI) (section 3.4 1 
Coordinate system and differential operators (sec- 
tion 



3.51 



Differencing methods, both central and upwind- 
ing (section [3~6| 

The physics module which determines the equa- 



tions to be solved (section 3.7) 
Each of these components can be modified without 
altering the other modules, provided that the inter- 
face methods are the same. In particular, the physics 
module has been designed to be the easiest to re- 



place since this is the one most users will need to 
alter. We now detail each of these components, and 
the algorithms used. 

3.1. Time integration 

Like BOUT, BOUT++ is built upon the general 
implicit time-integration solver CVODE [7J. This 
is used as a "black-box" , requiring no information 
about the equations themselves, only the values of 
the time-derivatives given a state of the system. This 
is illustrated in figure [2j the state of the system at 
a given time / (t) is passed from the CVODE li- 
brary to BOUT++. From this, BOUT++ calculates 
the time-derivatives of all quantities df j dt which is 
passed back to CVODE. This independence of the 
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Fig. 2. Data flow between CVODE library and BOUT++ 
code 

equations to be solved makes CVODE an ideal start- 
ing point for creating a general simulation frame- 
work. 

To advance the system state in time, CVODE 
uses the Newton-Krylov BDF implicit method for 
stiff problems. To be efficient, this method requires 
preconditioning and this can optionally be supplied 
to the solver. It has been found however that for 
the simulations so-far attempted, this has not been 
necessary. The incorporation and exploration of 
physics-based preconditioners is planned as a future 
extension. 

The interface to CVODE is in C, and so this has 
been wrapped into a C++ class. In principle there- 
fore the solver could be replaced by a different pack- 
age without affecting the remaining code. 

3.2. Input and output 

Input to BOUT++ consists of two files: an op- 
tions text file, and a binary grid file. The options 
file format is the same as a windows INI file, con- 
sisting of a mixture of comments, section headers 
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and name = value pairs which makes the settings 
used for a given simulation clear. All aspects of a 
simulation can be set at run-time except the equa- 
tions solved which are set in a compiled physics 
module (section |3.7[ ) . This includes the number of 
steps, run-time limits, data and restart output pe- 
riod, differencing methods, field initialisation and 
boundary conditions. Use of compile-time options 
(#dcfine C preprocessor directives) tends to confuse 
which settings were used for a given simulation and 
so these are not used except for debugging options 



(section 3.3.2 ). Instead, by keeping all options in one 



file and assigning default values to new options, sim- 
ulations can be easily repeated at a later time even 
if the code has changed internally. 

Binary data input and output (grid input, data 
and restart file output) are in the Portable Data 
Binary (PDB) format For pre- and post- 

processing of input and output files C, FORTRAN 
and python bindings are supplied as part of the 
Portable Application Code Toolkit [39], and an 
IDL library has been separately developed. IDL 
and python bindings in particular, enable fast de- 
velopment of interactive codes to view and analyse 
results. Future development includes the possibility 
of moving to a more widely used binary format such 
as netCDF [3UI3T] or HDF5 [32]. 



3.3. Data handling 

The simplest part of a simulation code is the han- 
dling of data storage and manipulation, but is also 
time-consuming and error-prone. In BOUT++ this 
is handled by a set of classes which manage all mem- 
ory management and looping over domain indices, 
allowing the remainder of the code to be written in 
a much more concise manner. Operator overload- 
ing allows 3D scalar and vector fields to be treated 
as simple variables, eliminating some common bugs 
due to mis-typing array indices, and making the 
source code much easier to read. 

Several data classes have been implemented: 3D 
scalar and vector fields, and axisymmetric (2D) 
scalar and vector fields which are constant in the z 
coordinate and are useful for tokamak simulations 
because all equilibrium quantities are axisymmetric 
in toroidal angle (see section 3.5.11. Scalar over- 



loaded operations include addition, multiplication, 
exponentiation by real values or scalar fields. For 
vector fields, the dot and cross products are repre- 
sented by * and Asymbols. For example, the follow- 



ing examples are all valid operations on scalar fields 
a,b and c, and vector fields x and y: 
Scalar 3D a, b, c; // 3D scalar fields 
real r; 

a = b + c; a = b-c; // Addition & Subtraction 
a = b*c;a = r*b; // Multiplication 

a = b/ c;a = b/r;a = r/ b; // Division 
a = b~c;a = b~r;a = r~b; // Exponentiation 

Vector3D x, y, z; // 3D vector fields 

x = y * a; // Multiplication by scalar field 

a = x * y // Dot-product 
x = y " z // Cross-product 

For both scalar and vector field operations, so long 
as the result of an operation is of the correct type, the 
usual C/C++ shorthand notation can be used (i.e. 
a *= b is equivalent to a = a * b). These operations 
can of course be combined into statements such as 
a = 4*b + (cA2). A complication is that in C++ 
the A operator has lower precedence than the * or + 
operators and so exponentiation and cross-product 
operations must be put in brackets. 

In addition to arithmetic operations, standard 
mathematical functions such as sqrtO and abs() 
are also overloaded. This allows all operations on 
scalar and vector fields to be written very clearly 
and concisely. 



3.3.1. Optimisation 

In most cases, a hand-coded, specialised program 
will execute faster than a more flexible code. Since 
flexibility is an aim of BOUT++, and performance 
is a concern for large-scale simulations, this must be 
addressed. A famous quote by Donald Knuth goes 
"We should forget about small efficiencies, say about 
97% of the time: premature optimisation is the root 
of all evil." [33], i.e. performance should not be the 
guiding principle in designing a code. This is be- 
cause optimisations treat special cases, making code 
less clear and bugs harder to find. Whilst develop- 
ing BOUT++ it has been generally found that high- 
level algorithms have a greater effect on execution 
times than low-level operations. In this case a small 
performance penalty is worthwhile because the flex- 
ibility gained allows faster development of high-level 
algorithms. 

In optimising BOUT++, bottlenecks have been 
identified using profiling tools, and optimisations 
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made where these did not adversely affect the clar- 
ity of the code. Two optimisations used in the data 
objects to speed up code execution are memory re- 
cycling, which eliminates allocation and freeing of 
memory; and copy-on-change, which minimises un- 
necessary copying of data. 

Both of these optimisations are done "behind the 
scenes" , hidden from the remainder of the code, and 
are illustrated in figure [3] The objects (A,B,C) ac- 



Scalar field objects 




b 



Interface layer 



Data layer- 



Stack of memory blocks Associated memory blocks 

Fig. 3. Memory handling in BOUT++ 

cessed by the user in operations discussed in the 
previous section act as an interface to underlying 
data (a,b). Memory recycling can be used because 
all the scalar fields are the same size (and vector 
fields arc implemented as a set of 3 scalar fields). 
Each class implements a global stack of available 
memory blocks. When an object is assigned a value, 
it attempts to grab one of these memory blocks, and 
if none are available then a new block is allocated. 
When an object is destroyed, its memory block is not 
freed, but is put onto the stack. Since the evaluation 
of the time-derivatives involves the same set of op- 
erations each time, this system means that memory 
is only allocated the first time the time-derivatives 
are calculated, after which the same memory blocks 
are re-used. This eliminates the often slow system 
calls needed to allocate and free memory, replacing 
them with fast pointer manipulation. 

Copy-on-change (reference counting) further re- 
duces memory usage and unnecessary copying of 
data. When one field is set equal to another (e.g. 
Scalar 3D A = B in figure |3j, no data is copied, only 
the reference to the underlying data (in this case 
both A and B point to data block a) . Only when one 
of these objects is modified is a second memory block 
used to store the different value. This is particularly 
useful when returning objects from a routine. Usu- 
ally this would involve copying data from one object 
to another, and then destroying the original copy. 
Using reference counting this copying is eliminated. 



3.3.2. Debugging support 

Several features are built into the BOUT++ data 
objects which aid debugging, and can be enabled 
at compile-time: run-time checking, operation and 
variable tracking, and segmentation fault handling. 
Run-time checking tests the result of every opera- 
tion (or subsets, depending on checking level) for 
non-finite results (NaN, Inf ), stopping with an error 
message when such a value is encountered. In order 
to help locate where these values have occurred, an 
additional compile flag can be used to enable track- 
ing of operations: variables can be assigned names, 
and the result of an operation on two fields is given 
a name based on the input names. For example, the 
result of A+B would be given the name "(A+B)", 
and similarly for all other operations and differential 
operators. Thus, when an error occurs the name of 
the variables involved can be printed; for example an 
error might read "Scalar 3D: Non-finite number 
at [12] [2] [10] in 'sqrt(a - b) "'. These options 
can be used for initial testing of a module, and then 
disabled for long production simulations. 

Tracking down bugs in a large code like BOUT++ 
can be very tricky, particularly for intermittent 
problems such as segmentation faults. This is be- 
cause these can be impractical to reproduce running 
under a debugger due to the run-time, and may 
even not occur under a debugger due to the dif- 
ferent memory layout. Finding where a bug occurs 
can therefore take a lot of trial-and-error. To help 
catch this kind of bug, a fast message stack has been 
implemented in BOUT++: the idea is that at the 
start of every function (or part of a function) which 
may cause faults a message is put in a stack, and 
then removed once the function finishes. If an error 
is found - either from non-finite number checking 
or segmentation fault - the message stack is printed 
out, giving a good idea of where the error occurred. 
The following is an example of a segmentation 
fault deliberately triggered in a parallel derivative 
operation in the RHS function: 

****** SEGMENTATION FAULT CAUGHT ****** 



====== Back trace ====== 

-> Grad_par( Scalar 3D ) 

-> Running RHS: Solver: : rhs (0 . 000000e+00) 
-> Initialising solver 

The penalty for enabling run-time checking for 
most operations, and the message stack above is an 
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increase in run-time of 10-15%. For most simula- 
tions so far performed, this computational cost has 
been acceptable, and so checking was enabled for all 
calculations presented here, including scaling runs 
in section [6] For longer simulations once a code has 
been thoroughly tested, the cost of run-time check- 
ing may become problematic. For these runs, all 
checking can be disabled. 



implement moving meshes by evolving the metric 
tensor, although this has not yet been attempted. 

Currently the coordinate system is restricted to 
having one symmetry direction [z), so that the met- 
ric tensor components are 2D fields. Changing this 
to allow fully 3D metric tensors would be straight- 
forward (and is planned as a future option), but is 
not currently necessary for tokamak simulations. 



3.4. Communication and topology 

Though parallel communication could be handled 
transparently by the data objects, there are several 
potential efficiency improvements which would be 
difficult to automate, such as overlapping commu- 
nication with calculations. For this reason, parallel 
communications in BOUT++ are handled by a sep- 
arate object to provide this flexibility. Field objects 
are grouped into communicator objects which are 
then run to perform the communications. Fields can 
therefore be grouped into several communicator ob- 
jects, performing communication at different times. 

Domain decomposition is in two dimensions {x 
and y), and is currently done as a regular grid: the 
number of points in each dimension is the same on 
each processor. Decomposition in z is a possible fu- 
ture extension, but is complicated because FFTs are 
used in this dimension for Laplacian inversion (sec- 
tion 3.6.1 1 and the shifting needed in tokamak field- 



aligned coordinates (section 3.5.1 1 



Topology is handled internally in the communica- 
tion object, using branch-cuts specified in the grid 
input file. This is important in, for example, simula- 
tions of tokamaks in x-point geometry where the do- 
main is not topologically rectangular. Within each 
processor's domain the grid is topologically rectan- 
gular, simplifying differencing methods, so branch- 
cuts must coincide with processor boundaries. 

3.5. Coordinate system 

Coordinate systems are implemented by using 
global scalar field objects for each component of the 
metric tensors g lJ and g^j, and Christoffel symbol 
r* fc components calculated from these. All dif- 



ferential operators (section 3.6.1) then use these 
quantities. 

When a grid file is loaded, these quantities are 
read if they are present, otherwise they can be set 
in the physics module. Since metric tensor quanti- 
ties are not fixed, this could in principle be used to 



3.5.1. Tokamak coordinate systems 

An important class of instabilities in tokamaks 
produces structures which are highly elongated 
along magnetic field-lines (fc|| << k±, where k\\ 
and are wavenumbers parallel and perpendic- 
ular to the magnetic field respectively). Aligning 
the computational mesh along the magnetic field 
therefore allows far fewer grid-points to be used 
in this direction, with a corresponding reduction 
in the computational cost of a simulation. Due to 
the periodicity and complications introduced by 
magnetic shear, some special features have been 
implemented to handle these coordinate systems 
which arc discussed here. 




Fig. 4. Diagram of field-lines and flux-surfaces in a tokamak. 
Major radius R, height Z, poloidal flux poloidal angle 6, 
and toroidal angle £. 

The derivation of the field-aligned coordinate 
system starts with an orthogonal toroidal coordi- 
nate system (ip, 9, £), illustrated in figure|4] ip is the 
poloidal flux, 9 the poloidal angle (from to 2n), 
and C the toroidal angle (also to 2ir). 

Aligning the mesh with the equilibrium magnetic 
field, grid-points are placed in a coordinate system 
defined by p]: 



(i 
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Where v is the local field-line pitch given by 
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where h$ — 1/ |V0| is the 6 scale factor. The con- 
travariant basis vectors are therefore 
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y = yo 



y = yo + 2n 





2%q (VO / 







Wx = Vip Vy = V0 

vz = vc - is/ip ~ v (tp, e) ve 

where I = J g dv(ip, 9) jdtydd is the integrated local 
shear. Physically, different flux surfaces are labelled 
by x, while different field lines on a flux surface are 
labelled by z (i.e. B • Vz = 0). The covariant basis 
vector (the vector between grid-points) is therefore: 



1 



RB, 

_ he_ 

z =Re c 



+ IRe^ 
B — hgeg + vR&q 



(3) 



where e are the unit vectors in the original orthog- 
onal toroidal (t/j, 6, £) coordinate system. 

The 9 periodicity now becomes a boundary con- 
dition on y: after a full poloidal circuit the mesh has 
now been shifted toroidally by 2nq radians where 
q(ip) — (1/2%) § vdd is the standard "safety fac- 
tor" . This shifted mesh must then be mapped 
onto the original mesh using toroidal periodicity at 
a fixed poloidal location (called the twist-shift con- 
dition pQ). 

The twist-shift condition guarantees (poloidal) 
periodicity of field values on the grid, but not of ra- 
dial derivatives. This is due to the magnetic shear, 
the variation in safety factor with flux-surface: fol- 
lowing a bundle of field-lines around the torus, it 
becomes sheared as the field-lines on one surface 
have a different pitch to those on another. When 
this field-line bundle completes one poloidal circuit 
of the torus and is connected back onto itself (peri- 
odicity constraint), shear produces a discontinuity 
in the radial derivative. This can be seen in the ra- 
dial covariant basis vector e x (equation [3]) and illus- 
trated in figure |5j at y = yo, 1 = and e x is in the 
Vf/> direction, whereas at y — ya + 2ir, 1^0 and so 
the coordinate system has a discontinuity. For differ- 
encing methods this corresponds to using a stencil 



Fig. 5. Radial derivatives in a field-aligned coordinate sys- 
tem, showing how 4 mesh points map as one travels once 
around the 8 (poloidal) direction 

which is discontinuous in space across this match- 
ing location yo- This "special" poloidal location yo 
is unphysical, and is often the source of numerical 
instability. 

The solution to this problem which is used in 
BOUT++ for tokamak simulations (section [5]) , is 
to use "quasi-ballooning" coordinates given in |35j . 
similar to the scheme used by the GEM codes [21] . 
Calculation of differentials are performed in a patch- 
work of local coordinate systems, in which the e x 
basis vector is in the direction i.e. the / term 
in equation[3]is dropped. This coordinate system no 
longer contains a "special" poloidal location, but in- 
stead flux-surfaces slide past each other. In general, 
grid-points will not be aligned in the V?/ 1 direction, 
and so interpolation in the toroidal (£) direction is 
required to perform x derivatives. Since the domain 
is periodic in this z, FFTs are used to perform this 
interpolation. 

Existing codes employing this type of shifted co- 
ordinate system solve scalar equations such as re- 
duced MHD or gyro- fluid (in the case of GEM) equa- 
tions. A complication arises however for vector equa- 
tions because the local coordinate system is non- 
commutative, and so cannot be fully described by a 
metric tensor: moving in the e^, direction and then 
along the e y (B field) direction is different to moving 
in b v then directions, due to magnetic shear. In 
differential geometry this is called a coordinate sys- 
tem torsion . In this case there is a change in the 
curl operator, which must now include a term due 
to this shifting between coordinate systems, propor- 
tional to the magnetic shear: 



VxA^VxA 



i.e. 



1 du 



A z e 2 



(4) 
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1 dv 
y/gdip 



Note that this term is not a physical effect, but is an 
artifact of using a non-commutative set of coordi- 
nates for differencing. In most tokamak simulations 
this term is expected to be small, but should be con- 
sidered. 

Both the ballooning (x,y,z) coordinates, and 
shifted quasi-ballooning (ip,y,z) coordinates have 
problems in handling magnetic shear. Quasi- 
ballooning coordinates are used in BOUT++ sim- 
ulations (though both systems can be employed) 
because non-commutativity is preferable to the 
introduction of a special poloidal location and nu- 
merical instability. 

3.6. Differencing methods 

BOUT++ is a finite-differencing code using 
Method of Lines (MOL), so that differentials are 
calculated in each dimension separately. Because 
of this, differential operators can be split into two 
components: the calculation of partial derivatives 
(e.g. d/dx) on the grid, and the use of these func- 
tions to implement differential operators using a 
specified metric tensor. 

The choice of differencing methods to be used is 
quite problem-specific, and so can be changed at 
run-time in the input file. Methods can be specified 
for central derivatives (first and second derivatives) , 
and upwinding in each dimension separately. Cur- 
rently the methods implemented for central deriva- 
tives are 2 nd order, 4 th order, Central Weighted 
Essentially Non Oscillatory (CWENO) [57155] and 
derivatives using the Fast Fourier Transform (FFT) 
in the z (axisymmetric) dimension. 

Advection terms require special treatment and so 
the following schemes are currently implemented: 
first order upwinding, and 3rd-order Weighted Es- 
sentially Non Oscillatory (WENO) [55130] . WENO 
methods provide high accuracy, whilst remaining 
well-behaved at steep gradients such as shocks, and 
this scheme has been used in all the tests presented 
in section 3] and ELM simulations in section |U 

3.6.1. Operators 

Differential operators use the differencing meth- 
ods specified in the input option file, and metric ten- 
sor components from the grid input file. Operators 
are divided into two classes: those which are inde- 
pendent of any coordinate system, and those which 



are intended for use in a Clebsch coordinate system 
where B = Vx x Vz, i.e with B aligned with the 
y coordinate. Because different numerical methods 
are needed for upwinding terms, the operation v-V/ 
has a special function V_dot_Grad(v, f ). 



V 


- v/ 


Vector 


= Grad (Scalar) 




/ 


= V a 


Scalar 


= Div(Vector) 




V 


= V x a 


Vector 


= Curl (Vector) 




/ 


= v-V<? 


Scalar 


= V_dot_Grad (Vector , 


Scalar) 


V 


= a Vb 


Vector 


= V_dot_Grad (Vector , 


Vector) 


/ 


- v 2 / 


Scalar 


= Laplacian (Scalar) 





These are operators which assume that the equi- 
librium magnetic field is written in Clebsch form as 



B = Vz x Vx \B \ ■■ 
These operators include: 



a(J = b • v 

/ = b • V(p x VA 



Scalar = Grad_par (Scalar) 
Scalar = Div_par (Scalar) 
Scalar = 



bOxGrad_dot_Grad (Scalar , Scalar) 

A common problem encountered in plasma fluid 
simulations is to invert an equation of the form: 



V 2 x 



= b 



With the operator V_l = V - b (b • V) = 
— bx (bxV). This operator appears in reduced 
MHD for the vorticity inversion, and is used in 
many of the tests described in section [4] and ELM 
simulations in section [5] Efficiently inverting this 
operator is done in the same way as in the BOUT 
code: FFTs are used in the z direction to transform 
this problem into a set of ID inversion problems 
(in x) for each Fourier mode. These inversion prob- 
lems are band-diagonal (tri-diagonal in the case of 
2nd-ordcr differencing) and so inversions are very 
efficient: 0(n z \ogn z ) for the FFTs, O (n x ) for 
tridiagonal inversion using the Thomas algorithm 
[41 j . where n x and n z are the number of grid-points 
in the x and z directions respectively. 
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3.7. Physics module 

This module determines the actual equations 
solved by BOUT++, and is the only part of 
BOUT++ which 'knows' what the variables phys- 
ically mean. Physics modules have to implement 
two functions: physics_init, which is called once 
at the start of the run and initialises variables, and 
physics_run which is called every time-step, and 
calculates time-derivatives for a given state (see fig- 
ure [2j section 3.1 1. To illustrate the clarity possible 
with BOUT++, the equations of ideal MHD and 
the corresponding lines of code are shown in table [T] 

Table 1 

Comparison of analytic Ideal MHD expressions, and the cor- 
responding BOUT-I — h code 





Analytic 






BOUT++ 


dn/dt 






Scalar 3D 


dndt = 




—v • Vn 






- V_dot_Grad(v, n) 




— nV ■ v 






- n*Div(v) 


dp/dt 






Scalar 3D 


dpdt = 




—v ■ Vp 






- V_dot_Grad(v, p) 




— 7pV ■ v 






- gamma*p*Div(v) 


dv/dt 






Vector3D 


dvdt = 




-v • Vv 






- V_dot_Grad(v, v) 




— Vp/n 






- Grad(p)/n 




+ 1 (V x 


B) x B 




+ (l/n)*(Curl(B)AB) 


dB/dt 






Vector3D 


dBdt = 




V x (v X 


B) 




Curl(vAB) 



In addition to the evolution equations, some ini- 
tialisation code is needed to set up the simulation 
problem. This initialisation function physics_init 
consists of 

- Definition of fields to store state and time- 
derivatives (declared as global variables) 

- Loading initial profiles 

- Calls to specify which fields to use for state and 
time-derivatives. 

- Creation of a communications object, and speci- 
fication of the fields to communicate (optional) 

- Addition of extra variables to the output files (op- 
tional) 

An important component of the problem speci- 
fication is the boundary conditions. In BOUT++, 
the boundary conditions for each evolving variable 



ary conditions on a scalar field / include zero- value, 
zero-gradient, Vj_/ = 0, (anti-) symmetric. Gener- 
alised implementation of more complicated coupled 
boundary conditions is a possible future extension. 

Putting all the problem-specific code in one place 
allows a user to quickly verify the equations being 
solved, and to quickly implement new physical mod- 
els. In the next section test problems using a variety 
of physics modules are presented. 



4. Test problems 

Three test problems are presented here, the first 
two of which were published in [42 , and have also 
been used to benchmark the BOUT code [15]: the 
resistive drift instability (section |4.1| tests the fi- 
delity with which the code simulates wave propa- 
gation and in particular wave phase shifts. An in- 



can be set in the input settings file (section 3.2), al- 
lowing the effect of changing boundary conditions 
to be quickly assessed. Currently, possible bound- 



terchange mode in a curved slab (section 4.2) is a 
simplified form of the ballooning mode, and so re- 
covering the growth-rate of this mode is important 
for the later ELM simulations. Finally, the Orszag- 
Tang vortex problem in ideal MHD is simulated in 
section |4.3| This tests the numerical stability and 
accuracy of BOUT++ in simulating shocks, which 
is potentially important for non-linear ELM simula- 
tions. 

4.1. Resistive drift-wave instability 

A drift-wave is a wave which exists in a plasma 
wherever there is a pressure gradient |44j . Without 
dissipation, fluctuations in density n and electro- 
static potential cj) are in phase so there is no trans- 
port of plasma and the wave amplitude does not 
grow. Dissipation, in this case resistivity, introduces 
a phase-shift between n and (f> and hence transport 
of plasma and growth of the mode. Since all that 
is required for radial transport is a pressure gradi- 
ent and some form of dissipation (in the absence of 
magnetic shear) , this is often called the "universal" 
instability. Because the growth of the resistive drift- 
wave instability is sensitive to phase shifts, this test 
checks how accurately this phase is simulated. 

The equations solved are for the density n, and 
vorticity lo — n^a • V x v. The simulation is electro- 
static, and the zero electron mass approximation is 
used to obtain the parallel current j\ | . All quantities 
with a '0' subscript are equilibrium and not evolved. 



dn 
dt 
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duJ flg 

V E = ^-b x V_l0 

J|| =o"|| (T 9||n- n Q 9||</>) 

The simulation domain is a cylindrical annulus with 
radius i? = 5.4 m, radial width 6 cm and constant 
density scale- length Ln = 4.5 m. This is a 2D pe- 
riodic simulation domain, but since perpendicular 
wavenumber is fixed in a given simulation, the simu- 
lation is effectively ID. Radial boundary conditions 
are zero-gradient vorticity and density, and = 0. 
The analytic dispersion relation is (cu — oj* ) im | + 
= 0, with diamagnetic frequency u>* — kTeo/L^ 
Figure [6] shows the analytic growth rate and 
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Fig. 6. Resistive Drift wave instability test. Dashed lines are 
analytic results, diamonds from BOUT-I — h simulations 



4.2. Interchange mode 

The interchange mode is an instability driven by 
pressure gradients and curvature, and has some fea- 
tures in common with a ballooning mode. This test 
is therefore a highly simplified version of the ELM 
problem simulated in section [5j 

The simulation domain is a curved slab with ra- 
dius of curvature R, periodic in z and with zero- 
gradient boundary conditions in x and y |42j . As 
with the drift instability test, this domain is 2D, but 
the wave-number is fixed in one of these dimensions. 
The equations solved are for density n and vorticity 

UK 



-V E ■ Vn 

2uj a (b x k ) ■ Vp 



On 

at 
~dt 

V E = b x V_l 
p = 2Tqu 

with the magnetic field curvature vector ko = bo • 
Vbo ~ l/R. The density gradient is in the x direc- 
tion with a length-scale of 2 cm, and the tempera- 
ture To is a constant. 
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Fig. 7. Interchange instability test. Solid lines are from an- 
alytic theory, symbols from BOUTH — h simulations, and the 
RMS density is averaged over z. Vertical dashed line marks 
the reference point, where analytic and simulation results 
are set equal 



real frequency for this mode (dashed line), and 
the BOUT++ results (diamonds). As the parallel 
conductivity <7m is varied there is a peak in the 
growth-rate, the location of which is recovered well 
by BOUT-I— I- , indicating that wave phases are ac- 
curately simulated. 



Figure [TJshows the time-dependence of density for 
two cases with R = 1 and 10 metres. This shows 
that the growth-rate (slope of each line) is well re- 
produced in both cases, giving some confidence in 
the simulation of ELMs to be discussed in section [H 
In addition, this growth-rate is maintained over a 
long period (mode amplitude increases by 8 orders 
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of magnitude in the case of R = 1 m) without noise 
or other numerical problems significantly affecting 
the result. 



4.3. Orszag-Tang vortex 



This is a standard test problem for multi- 
dimensional MHD codes which tests how robust a 
scheme is to the formation of MHD shocks, and the 
accuracy with which the V • B = condition is pre- 
served |45I46] . The equations solved are ideal MHD: 



dn 

dt 
dp 

dt 
dv 

~dt 
OB 

~dt 



= —v • Vn — nV • v 



= —v • Vp - 
= -v • Vv 



7pV • v 
1 



[(V x B) x B - Vp] 



= V x (v x B) 



in a periodic 2D domain with sides of length 1 . Mass 
density p = 25/(367r) and pressure p = 5/(12n) are 
uniform (with sound speed Cs — 1), and the initial 
conditions for velocity and magnetic field are: 



v 0, y) 



B (x,y) 



-sin (2iry) , sin (2ttx)] 
-sin (2iry) , sin (4ttx)] 



The simulation results at t = 0.5 shown in fig- 



8(a) agrees qualitatively with those from ideal 



MHD codes such as ATH ENA |47) . The divergence 
of B is shown in figure |8(b)| indicating that the 
formation of shocks leads to an increase in |V ■ B|. 
At the time shown in figure |8(a) t = 0.5 (vertical 
dashed line in figure |8(b)| >, |V • B| = 1.5 x 10~ 12 for 
a 128 x 128 mesh, 4.6 x 10~ 12 on a 256 x 256 grid 
and 1.8 x 10~ n on a 512 x 512 grid. The large-scale 
B/L values are ~ 10 _1 , demonstrating that the nu- 
merical methods used can maintain the V • B = 
condition to high accuracy. 

Note that the simulation progresses well beyond 



the point shown in figure 8(a) as can be seen in 
figure |8(b)| where the simulation runs to t = 10, with 
only very slow increase in V • B. 

A more quantitative comparison with other codes 
is to take slices through this solution at y = 0.3125 
and y — 0.4277, shown in figure [9] These can be 
compared with figure 11 in [5S], figure 3 in [4U| . 
and figure 9 in [50] . Figure [9] shows that whilst the 



(a) Pressure at t = 0.5 



2.5 r 

2.0 ; 

2 1.5 - 

l : 

x l.o H 



0.5 J 



4 




RMS (V • B) 



4 6 
Time 



10 



(b) Divergence of B: maximum (solid) and RMS 
(dashed) 

Fig. 8. Orszag-Tang vortex test on a 256 X 256 grid 
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Fig. 9. Slices through the Orszag-Tang vortex solution for 
three different grid resolutions 

BOUT++ solutions are very close to those from spe- 
cialised MHD codes, it is susceptible to oscillations 
at the top of shocks. These oscillations do not grow 
as the simulation progresses, and improve as grid 
resolution is increased. They are present because 
many terms in the equations are solved using cen- 
tral methods (4 t?i -order central differencing) - only 
the upwinding terms use WENO type methods for 
differencing, and currently no flux-splitting is em- 
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ployed. 

Handling of shocks and sharp gradients in 
BOUT++ is currently acceptable, in that they do 
not produce numerical instabilities or unphysical 
values: the WENO advection maintains positive- 
definite density and pressure, and the solution in 
smooth regions is accurate. Future work includes 
improving the handling of shocks to remove the 
small unphysical oscillations observed in figure [9j 

5. ELM simulations 

The primary motivation for developing this code 
is the simulation of Edge Localised Modes (ELMs) in 
tokamaks (see section [2| . In this section we present 
linear benchmarking of BOUT++, comparing the 
results with the ELITE linear MHD eigenvalue code 
[11112) . The equations solved using BOUT++ arc 
high-/3 reduced ideal MHD [44 , evolving vorticity 
lu = b • V x v, pressure p, and parallel component 
of the vector potential A\\ — b • A : 



duj o. f J\\ 



2b x k ■ Vp 



OA 



II 



dt 
dp 

dt 

U! 



-V, 



B 



b x V<t> ■ Vp 



1 

So" 
J\\ = J\\o 
d d 



Mo 
1 



VIA,, 



dt dt Bo 



b x V0 • V 



where po is the mass density (which is a constant); 
J|| = b • J the parallel current; the electrostatic 
potential; k = b • Vb is the field curvature (as 



for the interchange test, section 4.2 1, and every- 



thing is in SI units. The perturbed magnetic field 
is given in terms of the parallel vector potential by 
Bi = VAm x bo- The vorticity equation includes the 
kink/peeling term through the perturbed magnetic 
field: 



b- V 



Bo 



= "o 



V-^b -VA|| 
a o 



x V 



h 

Bo 



Previously, time-evolution codes solving resistive 
and/or extended MHD have been used to simulate 
ELMs |18)19)20I21)22I23)26)27] . To our knowledge, 



these are the first ideal MHD time-dependent sim- 
ulations of ELMs: no dissipation is intentionally 
introduced, so the only dissipation present is nu- 
merical. 

Boundary conditions used for the simulations pre- 
sented here are: 

dP 



LO = 



V X A\\ = 



= 



on inner and outer radial boundaries. 



= 



The coordinate system used for these simulations 



is that given in section 3.5.1 a field-aligned (flux) 



coordinate system with shifted radial derivatives. 
Differencing methods used are 4*' i -order central dif- 
ferencing and 3 rd -order WENO advection scheme. 
Radial boundary conditions used are zero-gradient 
pressure perturbation, zero parallel current, and 
zero vorticity. For these simulations no X-point is 
included and so the domain is periodic in y (with a 
twist-shift condition, see section 3.5.1 1 and periodic 



in z (toroidal angle). For efficiency, when performing 
linear simulations of a single toroidal mode number 
n, only l/n th of the torus is simulated. Non-linear 
effects will couple toroidal mode-numbers, and so a 
larger fraction of the torus must then be simulated. 

5.1. Linear benchmarking 

In order to benchmark BOUT++ for this prob- 
lem, linear simulations have been performed and 
comparison made to the ELITE linear code |12)11] . 
The data shown here is from a grid with 256 radial 
points (x), 64 poloidal (y) and 16 toroidal (z). One 
difficulty in comparing linear MHD codes with time- 
evolving simulations is the treatment of the vacuum 
region surrounding the plasma: whereas linear codes 
can treat this region analytically using Green's func- 
tions, time-dependent codes must simulate quan- 
tities in this region and handle a moving plasma- 
vacuum interface. For the results presented here, no 
distinction is made between the plasma and vacuum 
regions, and the test case used is strongly balloon- 
ing (pressure-driven) rather than peeling (current- 
driven). This makes the solution relatively insensi- 
tive to the location of any plasma- vacuum interface 
and provides a simplified starting point for compar- 
ison. Inclusion of a vacuum region, possibly follow- 
ing the level-set methodology used in NIMROD [5T] , 
and simulation of the peeling instability is the sub- 
ject of future work. 

The BOUT++ initial perturbation has a single 
toroidal mode- number n — 20, but is not an eigen- 
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Fig. 10. Peak RMS pressure perturbation in BOUTH — h giv- 
ing a growth rate 7 = 0.245/uja- Comparison to ELITE 
growth-rate of 7 = 0.239/a;^ (dashed line) 

mode of the system, and so first evolves through 
a transient phase before settling on an eigenmodc 
with a single growth-rate. This can be seen in fig- 
ure [lO] which shows the time-evolution of the peak 
RMS pressure. The growth-rate the solution settles 
on is 7 = 0.245^, close to the ELITE result of 
0.239^^4 (shown as a dashed line in figur e [To| . These 
growth-rates (and the time axis of figure |10[ ) are nor- 
malised to an Alfven frequency loa = Va/R, with 
Va = Bq/ y/fiopo, and R the major radius. 

There are several differences between BOUT++ 
and ELITE which could explain the small growth- 
rate discrepancy, including the equations solved, 
and the handling of the vacuum region: both 
BOUT++ and ELITE solve a reduced form of 
MHD valid for high-n, and should be identical in 
the limit of toroidal mode number n — > 00, but 
ELITE uses the energy principle [H] rather than 
time-evolution, and so is reduced from MHD in a 
different way to BOUT++. Differences are there- 
fore expected for finite n. A major difference is in 
the handling of the vaccum region: whereas ELITE 
uses an analytic calculation for the vacuum contri- 
bution, currently BOUT++ treats the "vacuum" 
region as a low-pressure ideal plasma. Future work 
includes improving modelling of this vacuum re- 
gion, which is essential for the correct simulation of 
peeling modes. 

Linear MHD codes such as ELITE calculate the 
mode structure of an instability in terms of the dis- 
placement vector £, so that the plasma velocity is 
given by 7^. The radial component £^ is related to 
the electrostatic potential <j> calculated by BOUT++ 
through the Ex B velocity: 7^ = — VfixB/B 2 -e^. 
The conversion from <f> to £ is simplified because in 
ideal MHD the frequency of an unstable mode is en- 



tirely imaginary - there is no real frequency compo- 
nent. If diamagnetic or other non-ideal effects are 
included, then a real frequency component appears 
and must be taken into account. 
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Fig. 11. Poloidal slice through the tokamak, showing radial 
displacement £^ for toroidal mode number n = 20. 



Figure 11 shows the radial displacement £^ on a 
poloidal slice (ip,9) through the tokamak at fixed 
toroidal angle £ from both BOUT++ (left) and 
ELITE (right). Note that due to the field-aligned 
coordinate system, the BOUT++ 9 resolution at 



fixed C shown in figure 11 is much higher than the 
resolution in y: the number of y poloidal points 
used (64) is actually the number of points along 
a given field- line, rather than the resolution in 9. 
Instead, the 9 resolution of figure 11 is determined 
by the number of field- lines simulated (n z = 16), 
the number of times these are repeated to form a 
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torus (n = 20), and the number of times each field- 
line travels around the torus toroidally for each 
poloidal revolution (safety factor q). In this case, 
1.29 < q < 5.39 (depending on radial location), and 
so the 8 resolution is ng = qnn z ~ 414 — * 1726. 
This illustrates the advantages of using field-aligned 
coordinate systems. 

A commonly used, and more quantitative way to 
display the information in figure [TT] is in terms of 
the poloidal mode structure. This is the amplitude 
of poloidal Fourier modes, calculated by taking the 
Fourier transform in a poloidal angle x- 
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Fig. 13. Subset of poloidal modes in figurc [T2"] from BOUT-I — h 
simulation. Vertical lines mark the location of resonant mag- 
netic surfaces 
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Fig. 12. Mode structure for toroidal mode number n = 20 



Figure [T2]shows the mode-structure calculated by 
BOUT++ and ELITE for a test case with a toroidal 
mode number of n = 20. Each line in this figure rep- 
resents the magnitude of a poloidal harmonic against 
the normalised poloidal flux tp (ip — at magnetic 
axis, 1 at plasma edge). The BOUT++ domain cov- 
ers the range 0.6 < ip < 1.2, but is shown in figure 12 
on the same scale as ELITE for comparison. 



In both of these results the amplitude of a given 
poloidal mode peaks close to its resonant magnetic 
surface, as is expected from analytic theory. This 
is shown in figure [13] which shows a subset of the 
BOUT++ poloidal modes (every 2nd mode for clar- 
ity), with their resonant surfaces i.e. where q(ip) is 
rational. 

As expected for a time-dependent code, the 
BOUT++ result in figure [12] is less "clean" than 
the ELITE result, containing additional poloidal 
modes from the initial transient. These additional 
modes gradually reduce in amplitude relative to the 
main resonant modes. The mode envelope of the 
BOUT++ result in figure [12] is in good agreement 
with the ELITE result, but individual poloidal har- 
monics are slightly more peaked in the BOUT++ 
rcsult. The main difference is close to the plasma 
"boundary" at normalised ip = 1. This is likely to 
be because ELITE is treating the region beyond 
this point as a vacuum, whereas BOUT++ treats it 
as an ideal plasma. Future work will include using 
different models of this vacuum region to assess its 
impact. 

Linear time-dependent simulations using BOUT++ 
currently reproduce many of the features of peeling- 
ballooning modes expected from analytic theory. 
The growth-rate and mode-structure produced in 
the BOUT++ simulation is very close to that from 
the linear MHD code ELITE. This is a proof-of- 
principle which demonstrates that BOUT++ is 
capable of simulating the ideal ballooning mode 
correctly using reduced ideal MHD. Further bench- 
marking against ELITE for other mode-numbers 
and equilibria, and non-linear ELM simulations are 
the subject of a future paper. 



14 



6. Performance 

Although the priority for BOUT++ is flexibility, 
it is also aimed at performing large-scale (10 7 — 10 8 
variable) simulations. Therefore, speed of execution 
and scaling to large number of processors is also im- 



portant (see discussion in section |3.3.1 ) . Here scaling 
of BOUTH — h run-times with problem size and num- 
ber of processors (hard scaling) are described. Ex- 
cept where otherwise stated, the linear ELM prob- 
lem is used as the test case. 



0{n x ) ,. 
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(a) Scaling with x dimension. n y 
n z = 16 



6.1. Scaling with problem size 

Typically 80-90% of the wall-clock time is spent 
evaluating the time-derivatives calculated in the 
physics module, with the remainder of the time 
spent advancing the time in the implicit CVODE 
solver. Because the time-integration scheme used is 
implicit, the time-step is not limited by the fastest 
waves on the grid. Instead, the time-step is deter- 
mined by the accuracy and hence behaviour of the 
simulation. This makes scaling with problem size 
harder to quantify than an explicit code where the 
CFL condition determines the time-step. This test 
is therefore problem-dependent, and is intended as 
a guide to the performance of BOUT++. 

Figure [14] shows variation of run-times on a fixed 
number of processors with problem size in the x, y 
and z directions on a log-log scale. The test case used 
is the linear ELM simulation described in section l5~Tl 
Scaling with x and z domain size is approximately 
linear (dotted line) since in this case the dynamics 
in these directions are not limiting the time-step. 
Scaling in the y direction (figure |14(b)| is between 
O (n) and 0(n 2 ), since fast parallel dynamics do 
have an impact on the time-step. These tests show 
that the algorithms used are efficient (O (n)) where 
the grid-size does not affect the time-step, and that 
at worst the scaling with problem size is O (n 2 ). 

6.2. Scaling with number of processors 

Scaling of the BOUTH — h code over a varying num- 
ber of processors and fixed problem size (hard scal- 
ing) has been performed using the National Energy 
Research Scientific Computing (NERSC) Franklin 
machine. This is a Cray XT-4 with 9,660 nodes 
linked by SeaStar 2 interconnect. Each node con- 
sists of a dual-core 2.6GHz AMD Opteron, giving 
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Fig. 14. Run time scaling with problem size on fixed number 
of processors (4) 

a theoretical peak performance for Franklin of ap- 
proximately 101.5 TFlop/s. 

Scaling efficiency e on Np processors is given rel- 
ative to a reference number of processors Np using 
the run-time T (Np) as: 



e(N P ) = 100 



N P0 T(N P0 ) 
N P T (N P ) 



which gives an estimate of the percentage of CPU 
time used in solving the problem, rather than syn- 
cronising with other processors. 



Figure 15 shows the scaling of an ELM simulation 
solving 3 fields on a 256x256x128 grid (i.e 25,165,824 
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Fig. 15. Scaling of a 3D ELM simulation on a 256 X 256 X 128 
mesh. CPU usage is given for arithmetic and differencing op- 
crations(bluc); Laplacian inversion code (red); communica- 
tions not part of inversion (yellow) and CVODE the time-ad- 
vance code (green) 

evolving variables) on up to 8192 processors. This 
shows that for this problem 2048 processors can be 
used at approximately 50% efficiency relative to 32 
processors. The number of processors which can be 
efficiently employed is dependent on the problem 
size, so this is the approximate grid size anticipated 
for future non-linear ELM simulations. For the lin- 



ear ELM simulations in section 5.1 (256 x 64 x 16 
mesh), the calculation is 50% efficient for between 
256 and 512 processors. 

One of the bottlenecks in ELM simulations for 
large numbers of processors is the Laplacian inver- 
sion code: figure |15(b)| shows the number of CPU 
hours devoted to various parts of the code, showing 
that the time spent in this inversion code (red, dia- 
monds) becomes significant for large number of pro- 
cessors. Currently the Laplacian inversion algorithm 
is quite a simple adaptation of the Thomas serial 
tridiagonal inversion scheme. The calculation is par- 
allelised by performing inversion of Y slices simulta- 
neously, but efficiency will decline once the number 
of processors in X exceeds the number of poloidal 
(Y) points per processor: Px > Ny/Py i-e. P > 
Ny ■ This may explain the faster decline in efficiency 



for greater than 256 processors in figure 15(a)| 

This Laplacian inversion code was implemented 
because it performs exactly the same operations as 
the serial code, and so provides a good base case for 
testing. The Thomas algorithm is however notori- 
ously inefficient on parallel computers and several 
better algorithms exist and will be implemented in 
future. 

In order to test the efficiency of the code in the 
absence of Laplacian inversions, a scaling study has 
been done for the 2D Orszag-Tang vortex problem in 
section|4.3| Full ideal MHD on a 512 x 512 mesh with 



2,097,152 evolving variables. Figure 16(a) shows the 



140 
120 
100 
80 
60 
40 
20 




10 100 1,000 

Number of processors 

(a) Efficiency 



10, 000 




100 1,000 
Number of processors 

(b) CPU usage 



10,000 



Fig. 16. Scaling of 2D ideal MHD Orszag-Tang vortex prob- 
lem on a 512 X 512 mesh 

efficiency of this case (again relative to 32 proces- 
sors). In this case - despite having fewer evolving 
variables - scaling is over 50% efficient for 4096 pro- 
cessors. Note the increase in efficiency between 32 
and 1024. This may be due to the increasing amount 
of available cache: smaller number of grid-points per 
processor mean more data will fit into fast cache, 
reducing memory access times. This may indicate 
that memory access is taking up a significant part 
of the processing time, which would be expected to 
be more of a problem for BOUT++ than its more 
specialised predecessor BOUT since BOUT++ per- 
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forms a loop over grid-points for every operation, 
whereas BOUT loops over grid-points once. 

For this case the total number of CPU hours used 
for each task is shown in figure |16(b)| Communica- 
tion of variables (yellow) doesn't seem to be a signif- 
icant problem for any of these test cases; in this case 
the limiting factor appears to be CVODE (green). 
This may be related to the small number of vari- 
ables; it's probable that CVODE could use more 
processors efficiently given a larger problem to solve. 



many features of the ELM such as mode-structure 
and growth-rate very close to that produced by the 
ELITE linear MHD code. These promising results 
indicate that BOUT++ is capable of accurately 
simulating ELMs using reduced ideal MHD, the first 
time such dissipationless (apart from numerical) 
simulations have been done. An extended analysis 
of BOUT++ ELM simulations, comparison with 
ELITE, and non-linear behaviour is the subject of 
a future publication. 



7. Conclusions 



A new fluid simulation code BOUT++ has been 
developed and some tests presented. The code is 
very modular, allowing new features such as differ- 
encing methods to be quickly implemented. In par- 
ticular the fluid model solved can be easily changed 
to include an arbitrary number of scalar and vector 
fields. This allows BOUT++ to be used as a plat- 
form for quickly testing both new algorithms, and 
different physics models. 

Numerical methods currently included are a fully 



implicit solver (the CVODE library, section 3.1l 



and WENO schemes for handling shocks. The stabil- 
ity and accuracy of these schemes has been demon- 
strated using a series of linear and non-linear prob- 
lems (section]!]). Whilst the current implementation 
has been found to be stable in the presence of shocks, 
accuracy in the vicinity of shocks needs further im- 
provement. 

Increased flexibility often comes with a perfor- 
mance cost and so several optimisation strategies 
used in BOUT++ to reduce this penalty whilst 
retaining flexibility have been described in sec- 
tion |3 . 3 - 1 1 Scaling of run-times with problem size 
and processor number in section [6] show the effi- 
ciency of the algorithms (O (n) where the time-step 
is not impacted, worst case O (n 2 )). Hard scaling 
to thousands of processors has been demonstrated 
using NERSC's Franklin Cray XT4 machine. Areas 
for improvement have been highlighted, particularly 
the need for a faster parallel Laplacian inversion 
algorithm. 

The main motivation in developing this code is 
to simulate Edge Localised Modes (ELMs) in toka- 
maks, and so several features specific to tokamak 
geometry have been implemented such as shifted 
radial derivatives described in section f3 . 5 . 1 1 and the 
topology necessary for simulation of equilibria with 
x-points. Linear simulations of ELMs reproduce 
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